Computer and Modernization

A Speeding K-means Clustering Method Based on Sampling

  

  • Received: 2013-09-17 Online: 2013-12-18 Published: 2013-12-18

Abstract: The traditional K-means clustering algorithm cannot handle large-scale dataset clustering. To address this problem, this paper presents an accelerated K-means clustering method based on random sampling, called the Kmeans_RS clustering algorithm. A working set is selected from the original dataset by random sampling, and the traditional K-means clustering method is executed on this working set. The center and radius of every resulting cluster are then computed, which gives the sampling result. The final clustering of the whole dataset is obtained by measuring the relationship between the sampling result and the remaining data and clustering the remaining data accordingly. Because random sampling reduces the size of the problem that K-means has to solve, clustering efficiency improves substantially, and the method can be applied to large-scale clustering problems. Simulation results demonstrate that this parallel accelerated K-means method achieves excellent clustering efficiency.
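
The steps described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the "relationship" between the sampling result and the remaining data is measured, so the sketch assumes nearest-center assignment and uses each cluster's radius (taken here as the largest member-to-center distance in the working set) only to flag points that lie outside every sampled cluster. The names `lloyd_kmeans`, `kmeans_rs`, `sample_size`, and `inside` are hypothetical, and the sketch is sequential rather than parallel.

```python
import numpy as np


def lloyd_kmeans(points, k, rng, n_iter=100):
    """Plain Lloyd's K-means on `points`; returns (centers, labels)."""
    # Initialise the centers with k distinct points drawn from the data.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Recompute labels so they match the final centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)


def kmeans_rs(data, k, sample_size, seed=0):
    """Sampling-based K-means sketch (assumed reading of the abstract)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)

    # Step 1: select the working set by random sampling without replacement.
    sample_idx = rng.choice(len(data), size=sample_size, replace=False)
    working_set = data[sample_idx]

    # Step 2: run traditional K-means on the working set only.
    centers, sample_labels = lloyd_kmeans(working_set, k, rng)

    # Step 3: compute the radius of every cluster (assumption: the largest
    # distance from a working-set member to its cluster center).
    radii = np.zeros(k)
    for j in range(k):
        members = working_set[sample_labels == j]
        if len(members) > 0:
            radii[j] = np.linalg.norm(members - centers[j], axis=1).max()

    # Step 4: cluster all data by nearest center (assumption). For very
    # large datasets this distance matrix would be computed in chunks.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # True where a point falls within the radius of its assigned cluster.
    inside = dists[np.arange(len(data)), labels] <= radii[labels]
    return labels, centers, radii, inside
```

Under these assumptions the iterative K-means step only ever sees `sample_size` points, while the full dataset is touched once in the final assignment pass, which is where the efficiency gain described above would come from.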

Key words: K-means clustering, random sampling, center, radius, working set, efficiency
